DEDIS: Distributed Exact Deduplication for Primary Storage Infrastructures
Abstract
Deduplication is now widely accepted as an efficient technique for reducing storage costs at the expense of some processing overhead, and it is increasingly sought in primary storage systems [7, 8] and in cloud computing infrastructures holding Virtual Machine (VM) volumes [2, 1, 5]. Besides the large number of duplicates found across static VM images [3], dynamic general-purpose data from VM volumes allows space savings of 58% to 80% when deduplicated in a cluster-wide fashion [1, 4]. However, some of these volumes persist latency-sensitive data, which limits the overhead that can be incurred in I/O operations. This problem must therefore be addressed by a cluster-wide distributed deduplication system designed for such primary storage volumes. Although considerable space savings are obtainable, storage latency is critical in primary workloads, so deduplication must introduce negligible overhead to be viable. Traditional in-line deduplication places computation inside the storage write path, adding unacceptable overhead to the latency of primary storage writes [6]. This penalty can be reduced by exploiting data locality, but that is viable only for specific storage workloads [5, 7]. Off-line deduplication, on the other hand, decouples aliasing from storage requests, which reduces the latency penalty but requires additional temporary storage space and increases concurrency in storage accesses. In fact, given the overhead of the copy-on-write mechanisms needed to avoid corrupting aliased data [1], even off-line deduplication must be confined to off-peak periods in order not to degrade latency. Off-peak periods in cloud infrastructures may be scarce, therefore
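The off-line approach described in the abstract can be illustrated with a minimal sketch. This is not the DEDIS design: the `BlockStore` class, its method names, the use of SHA-256 digests to detect duplicates, and the in-memory dictionaries are all assumptions made for illustration. The sketch shows only the two properties the abstract names: the write path does no hashing, and a write to an aliased block goes to a fresh physical block (copy-on-write), so other addresses sharing the old block are not corrupted.

```python
import hashlib


class BlockStore:
    """Toy in-memory block store (hypothetical, for illustration only)."""

    def __init__(self):
        self.physical = {}  # physical id -> block contents
        self.refs = {}      # physical id -> number of logical addresses using it
        self.mapping = {}   # logical address -> physical id
        self.next_id = 0

    def _release(self, pid):
        # Drop one reference and free the block once nothing points at it.
        self.refs[pid] -= 1
        if self.refs[pid] == 0:
            del self.physical[pid]
            del self.refs[pid]

    def write(self, addr, data):
        # Write path: no hashing and no index lookup, so no deduplication
        # latency is added here. Always writing to a fresh block doubles as
        # copy-on-write for blocks that are aliased by other addresses.
        pid = self.next_id
        self.next_id += 1
        self.physical[pid] = data
        self.refs[pid] = 1
        old = self.mapping.get(addr)
        self.mapping[addr] = pid
        if old is not None:
            self._release(old)

    def read(self, addr):
        return self.physical[self.mapping[addr]]

    def deduplicate(self):
        # Off-line pass, decoupled from storage requests: hash stored blocks
        # and alias every duplicate to the first block seen with that digest.
        # Assumption: digest equality is treated as content equality.
        index = {}  # digest -> physical id that is kept
        for addr, pid in list(self.mapping.items()):
            digest = hashlib.sha256(self.physical[pid]).digest()
            keep = index.setdefault(digest, pid)
            if keep != pid:
                self.mapping[addr] = keep
                self.refs[keep] += 1
                self._release(pid)
```

Until `deduplicate()` runs, duplicate blocks occupy separate physical blocks, which corresponds to the additional temporary storage space the abstract attributes to off-line deduplication.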
Similar resources
Distributed Exact Deduplication for Primary Storage Infrastructures
Deduplication of primary storage volumes in a cloud computing environment is increasingly desirable, as the resulting space savings contribute to the cost effectiveness of a large-scale multi-tenant infrastructure. However, traditional archival and backup deduplication systems impose prohibitive overhead for latency-sensitive applications deployed at these infrastructures, while current primary...
DEDIS: Exact Deduplication for Primary Distributed Storage
The removal of duplicate data from primary storage volumes in a cloud computing environment is increasingly desirable, as the resulting space savings contribute to the cost effectiveness of a large scale multi-tenant infrastructure. However, traditional archival and backup deduplication systems are not suited for large scale virtualized infrastructures and the I/O demanding applications there d...
HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud
Eliminating duplicate data in primary storage of clouds increases the cost-efficiency of cloud service providers as well as reduces the cost of users for using cloud services. Most existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use postprocessing deduplication running in system idle time to avoid the negative impact on I/O perform...
A Robust Fault-Tolerant and Scalable Cluster-wide Deduplication for Shared-Nothing Storage Systems
Deduplication has been largely employed in distributed storage systems to improve space efficiency. Traditional deduplication research ignores the design specifications of shared-nothing distributed storage systems such as no central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes, which are prone to errors in the event of a sys...
Cloud Based Data Deduplication with Secure Reliability
To eliminate duplicate copies of data, we use a data deduplication process. It is also used in cloud storage to reduce storage space and upload bandwidth: only one copy of each file is stored in the cloud, and that copy can be shared by many users. Deduplication helps to improve the use of storage space. A further challenge arises in the privacy of sensitive data. The aim of this pap...
Publication date: 2013